Welcome back to Deep Learning. So today I want to talk to you about a couple of advanced topics,
in particular looking into sparse annotations. We know that creating high-quality annotations
is extremely costly, and in the next couple of videos we want to talk about some ideas
on how to save annotation effort. So the topics will be weakly supervised learning and self-supervised
learning. Okay, so let's look at our slides and see what I have for you. So the topic is
weakly and self-supervised learning, and we start today by looking into limited annotations
and some definitions. Later we will look into self-supervised learning for representation
learning. So what's the problem with learning with limited annotations? Well, so far we
had supervised learning, and we've seen the impressive results achieved with large
amounts of training data and consistent, high-quality annotations. So here you see an
example where we had annotations for instance-based segmentation, and there we simply had the assumption
that all of these annotations are there, we can use them, and they are maybe even publicly
available, so it's no big deal. But in most cases that's actually not true. So
typically you have to annotate yourself, and annotation is very costly. So if you look at image-level
class labels, you will spend approximately 20 seconds per sample. Here you can see,
for example, the image with a dog. There are also ideas to make this faster, for example
by instance spotting, which you can see here in reference 11. If you then go to instance
segmentation then you actually have to draw outlines and that's at least 80 seconds per
annotation that you have to spend here. And if you go ahead to dense pixel level annotations
you can easily spend one and a half hours annotating an image like this one, as
you can see in reference 4. Now, the difference between weakly and strongly supervised learning
can be seen in this graph. Here you see that if we have image labels,
we can of course train on them to predict image labels, and that would essentially be supervised
learning; likewise training with bounding boxes to predict bounding boxes, and training with pixel labels
to predict pixel labels. Of course, you could also abstract from pixel labels to bounding
boxes or from bounding boxes to image labels, and all of that would be strong supervision.
Now the idea of weak supervision is that you start with image labels and go to bounding
boxes, or you start with bounding boxes and try to predict pixel labels. So this is the
key idea in weakly supervised learning: you somehow want to use sparse or weak annotations
and then create much more powerful predictors. So the key ingredients for weakly
supervised learning are that you use priors: explicit and implicit priors about shape
and size, contrast, motion (which can be used, for example, to shift bounding boxes), class
distributions (some classes are much more frequent than others), and similarity across images.
Of course, you can also use hints: image labels, bounding boxes, and image captions can
be used as weakly supervised labels, as well as sparse temporal labels that are then propagated over
time, or scribbles and clicks inside objects. Here are a couple of examples of such sparse
annotations with scribbles and clicks. There are some general approaches. One, going from
labels to localization, would be to use a pre-trained classification network and then,
for example, use tricks like in the lecture on visualization to produce
a qualitative segmentation map. So here we had the idea of backpropagating the class
label into the image domain in order to produce such maps. Now, the first problem is that this
classifier was never trained for localized decisions, and the second problem is that good classifiers
don't automatically yield good maps. So let's look into another idea, and the key idea here
is to use global average pooling. So let's think back to the fully convolutional networks
and what we've been doing there. You remember that we can replace fully connected layers,
which have only a fixed input size, with convolutions. If we do so, you see that if we have
some input image and we convolve it with a tensor, then essentially we get one output. Now if
we have multiple of those tensors, then we essentially get multiple channels. And
if we now start moving our convolution masks across the image domain, you can see that
if we have a larger input image, then our outputs will also grow with the input size.
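To make this size behavior concrete, here is a minimal pure-Python sketch (not the lecture's code; the filter, sizes, and values are all illustrative): a single "class" filter applied in valid mode yields one value when the input matches the filter size, a spatial map on larger inputs, and global average pooling collapses that map back into one class score.

```python
import random

def conv2d_valid(image, kernel):
    """Slide one 2-D kernel over the image ('valid' mode, stride 1)."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def global_average_pool(feature_map):
    """Collapse a 2-D activation map into a single score."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)

random.seed(0)
kernel = [[random.gauss(0, 1) for _ in range(5)] for _ in range(5)]  # one "class" filter

small = [[random.gauss(0, 1) for _ in range(5)] for _ in range(5)]    # input matches filter size
large = [[random.gauss(0, 1) for _ in range(12)] for _ in range(12)]  # larger input image

m_small = conv2d_valid(small, kernel)
m_large = conv2d_valid(large, kernel)
print(len(m_small), len(m_small[0]))  # 1 1 -> a 5x5 input gives a single output, like an FC layer
print(len(m_large), len(m_large[0]))  # 8 8 -> a larger input gives a spatial map
score = global_average_pool(m_large)  # one class score, regardless of input size
```

Note how the same filter acts like a fully connected layer on the small input but produces a localization-friendly spatial map on the large one; global average pooling is what lets the network still emit a single classification score.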
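Coming back to the earlier idea of backpropagating the class label into the image domain: that can also be sketched in a few lines. Here a toy linear scorer and a finite-difference gradient stand in for a trained CNN and automatic differentiation; all names and numbers are illustrative, not the lecture's actual method.

```python
# Toy sketch of gradient-based saliency: the "classifier" is a linear scorer
# s(x) = sum(w * x), and the gradient of the score with respect to each pixel
# (computed here by central finite differences) is the saliency map.
H, W = 8, 8
# A classifier that only looks at a centre patch of the image:
w = [[1.0 if 2 <= i < 6 and 2 <= j < 6 else 0.0 for j in range(W)] for i in range(H)]
x = [[0.5 for _ in range(W)] for _ in range(H)]  # toy "image"

def class_score(img):
    return sum(w[i][j] * img[i][j] for i in range(H) for j in range(W))

eps = 1e-5
saliency = [[0.0] * W for _ in range(H)]
for i in range(H):
    for j in range(W):
        up = [row[:] for row in x]; up[i][j] += eps
        down = [row[:] for row in x]; down[i][j] -= eps
        saliency[i][j] = (class_score(up) - class_score(down)) / (2 * eps)

# Only the pixels the classifier actually uses get nonzero saliency:
assert all(abs(saliency[i][j] - w[i][j]) < 1e-6 for i in range(H) for j in range(W))
```

For this linear toy model the saliency map recovers the weights exactly, which is the best case; as noted above, a real classifier was never trained for localized decisions, so its maps are usually much noisier.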
Deep Learning - Weakly and Self-Supervised Learning Part 1
In this video, we discuss weak supervision and demonstrate how to create class activation maps for localization and how to get from bounding boxes to pixel segmentations.
Duration: 00:12:13 min
Recorded: 2020-10-12 (uploaded 2020-10-12 22:36:20)
Language: en-US
Access: open